Patterns in UFO Sightings

Date: August 2, 2023

Name: Chanleakhana Thon

Introduction

Description: The dataset I decided to look at is called “UFO Sightings”; the version loaded below contains 60,632 records of UFO sightings. Each record includes the UFO’s shape, the location of the sighting (city, state, country, and coordinates), the duration of the sighting, and the date and time of the sighting.

Link: https://corgis-edu.github.io/corgis/datasets/csv/ufo_sightings/ufo_sightings.csv

Motivation: I am interested in conspiracy theories and extraterrestrial beings, and I would like to find out whether there are patterns behind UFO sightings and what those patterns could mean.

Questions:

* Is there a pattern among the most common times and location of UFO sightings?
* Is there a common theme with the locations that have the most common UFO sightings?
* Does the duration of the UFO sighting play an important role?
* Do these patterns imply that these UFO sightings can be credible or debunked?
* How have UFO sightings changed over time?

Methods

import pandas as pd
import matplotlib.pyplot as plt
import nltk
import requests
response=requests.get('https://corgis-edu.github.io/corgis/datasets/csv/ufo_sightings/ufo_sightings.csv')
response
<Response [200]>

Data Summary: The dataset includes the shape of the UFO, the location of the sighting (city, state, and country), the duration of the encounter (in seconds), a short description of the sighting, the coordinates of the sighting, the date and time of the sighting (year, month, day, hour, minute), and the date on which the sighting was documented. All of the columns are numerical except for the description, which is free text, and the shape and location, which are categorical.

url='https://corgis-edu.github.io/corgis/datasets/csv/ufo_sightings/ufo_sightings.csv'
df=pd.read_csv(url)
df.head(5)
Location.City Location.State Location.Country Data.Shape Data.Encounter duration Data.Description excerpt Location.Coordinates.Latitude Location.Coordinates.Longitude Dates.Sighted.Year Dates.Sighted.Month Date.Sighted.Day Dates.Sighted.Hour Dates.Sighted.Minute Dates.Documented.Year Dates.Documented.Month Dates.Documented.Day
0 anchor point AK US disk 300.0 Large UFO over Mt. ILIAMNA Alaska. ((NUFORC N... 59.776667 -151.831389 2005 5 24 18 30 2005 5 28
1 anchorage AK US changing 21600.0 We could observe red lights dancing across the... 61.218056 -149.900278 2000 12 31 21 0 2001 2 18
2 anchorage AK US changing 600.0 INTENSE AMBER-ORANGE HONEYCOMB SHAPED DUAL HOR... 61.218056 -149.900278 2006 10 23 21 3 2006 12 7
3 anchorage AK US cigar 15.0 I explained away the first time I thought I se... 61.218056 -149.900278 2014 3 29 20 45 2014 4 4
4 anchorage AK US circle 300.0 Orange circles &quot;climbing&quot; then fadin... 61.218056 -149.900278 2011 10 21 21 0 2011 10 25

Summary Statistics:

df.describe()
Data.Encounter duration Location.Coordinates.Latitude Location.Coordinates.Longitude Dates.Sighted.Year Dates.Sighted.Month Date.Sighted.Day Dates.Sighted.Hour Dates.Sighted.Minute Dates.Documented.Year Dates.Documented.Month Dates.Documented.Day
count 6.063200e+04 60632.000000 60632.000000 60632.000000 60632.000000 60632.000000 60632.000000 60632.000000 60632.000000 60632.000000 60632.000000
mean 5.410128e+03 38.311073 -95.584796 2004.447833 6.872658 15.026587 15.809094 17.718367 2007.401537 6.706063 15.229219
std 4.143867e+05 5.552705 18.025296 10.178389 3.249002 8.920703 7.537834 17.924455 4.480640 3.487636 8.789173
min 1.000000e-02 19.426944 -170.478889 1910.000000 1.000000 1.000000 0.000000 0.000000 1998.000000 1.000000 1.000000
25% 3.000000e+01 34.092222 -114.336667 2002.000000 4.000000 7.000000 11.000000 0.000000 2004.000000 4.000000 8.000000
50% 1.800000e+02 38.904306 -89.911111 2007.000000 7.000000 15.000000 19.000000 15.000000 2008.000000 7.000000 14.000000
75% 6.000000e+02 41.924583 -81.035000 2011.000000 10.000000 22.000000 21.000000 30.000000 2012.000000 10.000000 22.000000
max 6.627600e+07 70.636944 -66.984722 2014.000000 12.000000 31.000000 23.000000 59.000000 2014.000000 12.000000 31.000000
df.shape
(60632, 16)

Outlier Data: The date and time columns do not appear to contain outliers: their minimum and maximum values fall within plausible ranges for years, months, days, hours, and minutes. The encounter duration does, however. Its maximum of 6.627600e+07 seconds is about 767 days, far larger than the mean of roughly 5,410 seconds and probably unrealistic.
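To make the scale of that maximum concrete, the conversion from seconds to days can be sketched as follows. The constant `SECONDS_PER_DAY` and the helper `seconds_to_days_hours` are illustrative names, not part of the notebook; only the maximum value is taken from the `df.describe()` output.

```python
SECONDS_PER_DAY = 86_400  # 24 hours * 60 minutes * 60 seconds


def seconds_to_days_hours(seconds):
    """Split a duration in seconds into whole days and leftover hours."""
    days, remainder = divmod(seconds, SECONDS_PER_DAY)
    return int(days), remainder / 3600


# Maximum of 'Data.Encounter duration' as reported by df.describe()
max_duration = 6.627600e+07
days, hours = seconds_to_days_hours(max_duration)
print(f"{days} days and {hours:.1f} hours")
```

The same helper could be applied to any other suspicious duration before deciding on a cutoff.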

Data Preprocessing: I decided to only look at sightings whose encounter duration was less than a day, because anything longer could be a typo or could represent multiple sightings over multiple days. Therefore, I dropped every row with an encounter duration of 86,400 seconds or more.
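The filtering step described above can be sketched on a small made-up frame. The sample rows are invented for illustration; only the column name `Data.Encounter duration` and the 86,400-second cutoff come from this notebook. This variant uses a boolean mask and returns a new frame, which is one possible alternative to dropping rows in place.

```python
import pandas as pd

SECONDS_PER_DAY = 86_400

# Hypothetical sample rows; not taken from the real dataset.
sample = pd.DataFrame({
    "Location.City": ["a", "b", "c", "d"],
    "Data.Encounter duration": [300.0, 86_400.0, 73_800.0, 6.6276e7],
})

# Keep only encounters strictly shorter than one day.
mask = sample["Data.Encounter duration"] < SECONDS_PER_DAY
filtered = sample[mask].reset_index(drop=True)
print(filtered)
```

Because the mask keeps rows rather than dropping them, the original frame stays available for comparison.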

# check column names
print(df.columns)
Index(['Location.City', 'Location.State', 'Location.Country', 'Data.Shape',
       'Data.Encounter duration', 'Data.Description excerpt',
       'Location.Coordinates.Latitude ', 'Location.Coordinates.Longitude ',
       'Dates.Sighted.Year', 'Dates.Sighted.Month', 'Date.Sighted.Day',
       'Dates.Sighted.Hour', 'Dates.Sighted.Minute', 'Dates.Documented.Year',
       'Dates.Documented.Month', 'Dates.Documented.Day'],
      dtype='object')
# preview the data with duplicate rows dropped
# (drop_duplicates returns a new DataFrame; df itself is not modified here)
df.drop_duplicates()
Location.City Location.State Location.Country Data.Shape Data.Encounter duration Data.Description excerpt Location.Coordinates.Latitude Location.Coordinates.Longitude Dates.Sighted.Year Dates.Sighted.Month Date.Sighted.Day Dates.Sighted.Hour Dates.Sighted.Minute Dates.Documented.Year Dates.Documented.Month Dates.Documented.Day
0 anchor point AK US disk 300.0 Large UFO over Mt. ILIAMNA Alaska. ((NUFORC N... 59.776667 -151.831389 2005 5 24 18 30 2005 5 28
1 anchorage AK US changing 21600.0 We could observe red lights dancing across the... 61.218056 -149.900278 2000 12 31 21 0 2001 2 18
2 anchorage AK US changing 600.0 INTENSE AMBER-ORANGE HONEYCOMB SHAPED DUAL HOR... 61.218056 -149.900278 2006 10 23 21 3 2006 12 7
3 anchorage AK US cigar 15.0 I explained away the first time I thought I se... 61.218056 -149.900278 2014 3 29 20 45 2014 4 4
4 anchorage AK US circle 300.0 Orange circles &quot;climbing&quot; then fadin... 61.218056 -149.900278 2011 10 21 21 0 2011 10 25
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
60627 sheridan WY US oval 20.0 blue-green bright oval was spotted 20 miles so... 44.797222 -106.955556 2002 9 6 21 0 2002 9 13
60628 thermopolis WY US unknown 15.0 UFO near Thermopolis WY 43.646111 -108.211389 2007 6 14 23 0 2007 8 7
60629 torrington WY US cigar 2.0 I was on a hill enjoying the sunset. I fell as... 42.065000 -104.181111 2011 11 5 21 30 2011 12 12
60630 worland WY US light 15.0 The object was a dim point of light that grew ... 44.016944 -107.954722 2003 6 17 22 42 2003 6 18
60631 worland WY US oval 2700.0 ((HOAX??)) My parents told me they saw this U... 44.016944 -107.954722 2008 2 15 5 0 2008 4 17

60630 rows × 16 columns

# remove outlier durations of a day (86400 seconds) or longer
df.drop(df[df['Data.Encounter duration'] >= 86400].index, inplace = True)
df
Location.City Location.State Location.Country Data.Shape Data.Encounter duration Data.Description excerpt Location.Coordinates.Latitude Location.Coordinates.Longitude Dates.Sighted.Year Dates.Sighted.Month Date.Sighted.Day Dates.Sighted.Hour Dates.Sighted.Minute Dates.Documented.Year Dates.Documented.Month Dates.Documented.Day
0 anchor point AK US disk 300.0 Large UFO over Mt. ILIAMNA Alaska. ((NUFORC N... 59.776667 -151.831389 2005 5 24 18 30 2005 5 28
1 anchorage AK US changing 21600.0 We could observe red lights dancing across the... 61.218056 -149.900278 2000 12 31 21 0 2001 2 18
2 anchorage AK US changing 600.0 INTENSE AMBER-ORANGE HONEYCOMB SHAPED DUAL HOR... 61.218056 -149.900278 2006 10 23 21 3 2006 12 7
3 anchorage AK US cigar 15.0 I explained away the first time I thought I se... 61.218056 -149.900278 2014 3 29 20 45 2014 4 4
4 anchorage AK US circle 300.0 Orange circles &quot;climbing&quot; then fadin... 61.218056 -149.900278 2011 10 21 21 0 2011 10 25
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
60627 sheridan WY US oval 20.0 blue-green bright oval was spotted 20 miles so... 44.797222 -106.955556 2002 9 6 21 0 2002 9 13
60628 thermopolis WY US unknown 15.0 UFO near Thermopolis WY 43.646111 -108.211389 2007 6 14 23 0 2007 8 7
60629 torrington WY US cigar 2.0 I was on a hill enjoying the sunset. I fell as... 42.065000 -104.181111 2011 11 5 21 30 2011 12 12
60630 worland WY US light 15.0 The object was a dim point of light that grew ... 44.016944 -107.954722 2003 6 17 22 42 2003 6 18
60631 worland WY US oval 2700.0 ((HOAX??)) My parents told me they saw this U... 44.016944 -107.954722 2008 2 15 5 0 2008 4 17

60501 rows × 16 columns

# check if the largest encounter durations are less than 86400 seconds
df.nlargest(5, "Data.Encounter duration")
Location.City Location.State Location.Country Data.Shape Data.Encounter duration Data.Description excerpt Location.Coordinates.Latitude Location.Coordinates.Longitude Dates.Sighted.Year Dates.Sighted.Month Date.Sighted.Day Dates.Sighted.Hour Dates.Sighted.Minute Dates.Documented.Year Dates.Documented.Month Dates.Documented.Day
345 bessemer AL US unknown 73800.0 10/26/2011 To whom it may concern On or about ... 33.401667 -86.954444 1987 2 20 1 30 2011 12 12
2651 phoenix AZ US light 73800.0 Long Streaks of Light (that put me in mind of ... 33.448333 -112.073333 1998 2 25 3 0 1999 1 28
2921 prescott valley AZ US other 73800.0 3 then up to 6 white flashing lights move erra... 34.610000 -112.315000 2013 10 22 18 30 2013 11 11
9867 san fernando CA US triangle 73800.0 watch many ligth&#39s in the sky day&#39s befo... 34.281944 -118.438056 1998 8 15 17 30 2001 1 3
15522 jacksonville FL US triangle 73800.0 SILENT TRIANGLE SPARKLING LIGHTS INVISBLE CENTER. 30.331944 -81.655833 2010 5 6 20 0 2010 5 12

Results

Exploratory Data Visualizations

plt.hist(x=df['Dates.Sighted.Hour'], bins=24)
plt.title('Most Common Hour of UFO Sightings')
Text(0.5, 1.0, 'Most Common Hour of UFO Sightings')

First Exploratory Visualization: The first exploratory visualization is a histogram of the hour of day at which sightings occurred. Sightings peak around 9 PM and are least common around 8 AM. The counts rise as it gets darker and fall when there is more sunlight.
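The peak and trough hours can also be read off numerically instead of from the plot. The sketch below uses a short made-up list of hours in place of `df['Dates.Sighted.Hour']`; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical hours (0-23) standing in for df['Dates.Sighted.Hour'].
hours = [21, 21, 22, 20, 21, 8, 3, 19, 22, 21, 20, 13]

counts = Counter(hours)
# Include hours with zero sightings so the quietest hour is not missed.
per_hour = {h: counts.get(h, 0) for h in range(24)}

busiest_hour = max(per_hour, key=per_hour.get)
quietest_hour = min(per_hour, key=per_hour.get)
print("busiest hour:", busiest_hour, "with", per_hour[busiest_hour], "sightings")
print("quietest hour:", quietest_hour, "with", per_hour[quietest_hour], "sightings")
```

When several hours tie, `max` and `min` return the first one encountered, so ties should be checked separately on real data.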

import plotly.graph_objects as go

fig = go.Figure(data=go.Scattergeo(
        lon = df['Location.Coordinates.Longitude '],
        lat = df['Location.Coordinates.Latitude '],
        mode = 'markers'
        ))

fig.update_layout(
        title = 'Coordinates of UFO Sightings',
        geo_scope='usa',
    )

fig.show()

Second Exploratory Visualization: The second exploratory visualization maps the coordinates of every UFO sighting in the United States across all years. Most of the recorded sightings are in the eastern half of the United States and along the West Coast, while the region in between is much less dense.

city=df['Location.City']
top15city=nltk.FreqDist(city).most_common(15)
top15city

for x,y in top15city:
    plt.barh(x,y)

plt.title("Top 15 Cities with UFO Sightings")
Text(0.5, 1.0, 'Top 15 Cities with UFO Sightings')

Third Exploratory Visualization: The third exploratory visualization is a horizontal bar graph of the top 15 cities by number of UFO sightings. The top 5 are big cities in the western United States, which is interesting because the previous map showed the densest clusters of sightings mostly in the eastern half of the country, yet these individual western cities have the greatest number of sightings.
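One caveat when counting by city name alone is that cities sharing a name in different states are merged into a single count. The sketch below illustrates this with `collections.Counter`, which `nltk.FreqDist` subclasses; the rows are made up and stand in for the `Location.City` and `Location.State` columns.

```python
from collections import Counter

# Hypothetical (city, state) pairs; not taken from the real dataset.
rows = [
    ("springfield", "IL"), ("springfield", "MO"), ("springfield", "IL"),
    ("seattle", "WA"), ("seattle", "WA"), ("phoenix", "AZ"),
]

# Counting by name alone merges same-named cities across states.
by_city = Counter(city for city, _ in rows)
# Counting (city, state) pairs keeps them separate.
by_city_state = Counter(rows)

print(by_city.most_common(2))
print(by_city_state.most_common(2))
```

Whether this matters for the top 15 depends on the data; large cities with unique names are unlikely to be affected.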

shape=df['Data.Shape']
top15shapes=nltk.FreqDist(shape).most_common(15)
top15shapes

for x,y in top15shapes:
    plt.barh(x,y)

plt.title("Top 15 UFO Shape Sightings")
Text(0.5, 1.0, 'Top 15 UFO Shape Sightings')

Fourth Exploratory Visualization: The fourth exploratory visualization is also a horizontal bar graph, showing the 15 most commonly reported UFO shapes. Sightings of the ‘light’ shape are far more numerous than any other shape, with over 12,000 sightings, while each of the other shapes has fewer than 6,000.

Data Visualizations

# Analyze most common UFO shapes in one of the top 5 cities
# Analyze most common UFO shape in one of the cities with fewer sightings
# See if there's a correlation in UFO sightings over time
year=df['Dates.Documented.Year']
sightingsperyear=nltk.FreqDist(year)
sightingsperyear

for i in sightingsperyear:
    plt.scatter(i, sightingsperyear[i])

plt.title("UFO Sightings per Year")
Text(0.5, 1.0, 'UFO Sightings per Year')

First Visualization: The first visualization is a scatter plot of the number of UFO sightings per year, counted by the year in which each sighting was documented (Dates.Documented.Year). The number of sightings generally increases each year, peaking in 2012 with over 6,000 sightings, and then drops in 2014, which could be because the data does not cover all of 2014.
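The trend described above can be checked numerically by sorting the yearly counts and looking at the change from one year to the next. This is a minimal sketch on invented years standing in for `df['Dates.Documented.Year']`; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical years; not taken from the real dataset.
years = [2010, 2011, 2011, 2012, 2012, 2012, 2013, 2013, 2014]

per_year = Counter(years)
# Sort by year so the series reads left to right in time.
series = sorted(per_year.items())

# Year with the most reports.
peak_year, peak_count = max(series, key=lambda item: item[1])

# Change in the number of reports relative to the previous year.
changes = [(y2, c2 - c1) for (y1, c1), (y2, c2) in zip(series, series[1:])]

print("peak:", peak_year, peak_count)
print("year-over-year changes:", changes)
```

Running the same steps on `Dates.Sighted.Year` instead would show whether the pattern holds for when sightings happened rather than when they were documented.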

# distribution of UFO encounter durations (x-axis limited to 0-4000 seconds)
plt.hist(x=df['Data.Encounter duration'], bins=1000)
plt.xlim([0, 4000])
plt.title('Frequency of Encounter Duration of UFO Sightings')
Text(0.5, 1.0, 'Frequency of Encounter Duration of UFO Sightings')

Second Visualization: The second visualization is a histogram of encounter durations. Because the x-axis is limited to 0 to 4,000 seconds, the plot covers only the shorter encounters, which make up the bulk of the data: the summary statistics above give a median of 180 seconds (3 minutes) and a 75th percentile of 600 seconds (10 minutes) before filtering. Much longer reports, such as those around 7,000 seconds (about 2 hours) or 27,000 seconds (7.5 hours), fall outside the plotted range and could reflect multiple separate encounters reported together.
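When reading this histogram it helps to know how wide each bar is. With `bins=1000`, matplotlib splits the range between the smallest and largest duration into equal-width bins. The sketch below works through that arithmetic, assuming the minimum of 0.01 seconds from `df.describe()` and the maximum of 73,800 seconds from the `df.nlargest` output; the helper name `bin_width` is illustrative.

```python
def bin_width(lo, hi, bins):
    """Width of each bin when [lo, hi] is split into equal-width bins."""
    return (hi - lo) / bins


min_duration = 0.01      # min of 'Data.Encounter duration' from df.describe()
max_duration = 73_800.0  # largest duration left after filtering
width = bin_width(min_duration, max_duration, 1000)

# Number of bins that fall inside the plotted range of 0 to 4000 seconds.
visible_bins = 4000 / width

print(f"each bin spans about {width:.1f} seconds")
print(f"about {visible_bins:.0f} bins are visible in the plot")
```

Each bar therefore groups durations spanning a little over a minute, so common round values such as 60, 120, or 300 seconds land in different bars.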


seattle = df.loc[df['Location.City'] == "seattle"]
seattleshape=seattle['Data.Shape']
seattlecommonshapes=nltk.FreqDist(seattleshape).most_common(9)

lasvegas = df.loc[df['Location.City'] == "las vegas"]
lasvegasshape=lasvegas['Data.Shape']
lasvegascommonshapes=nltk.FreqDist(lasvegasshape).most_common(9)

la = df.loc[df['Location.City'] == "los angeles"]
lashape=la['Data.Shape']
lacommonshapes=nltk.FreqDist(lashape).most_common(9)

phoenix = df.loc[df['Location.City'] == "phoenix"]
phoenixshape=phoenix['Data.Shape']
phoenixcommonshapes=nltk.FreqDist(phoenixshape).most_common(9)
fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows=4, ncols=1,gridspec_kw={'hspace': 0.5},figsize=(9, 12))

for shape, count in lasvegascommonshapes:
    ax1.barh(shape, count)

ax1.set_title('Common Shapes in Las Vegas')
ax1.set_xlabel('Frequency')
ax1.set_ylabel('Shape')

for shape, count in seattlecommonshapes:
    ax2.barh(shape, count)

ax2.set_title('Common Shapes in Seattle')
ax2.set_xlabel('Frequency')
ax2.set_ylabel('Shape')

for shape, count in lacommonshapes:
    ax3.barh(shape, count)

ax3.set_title('Common Shapes in Los Angeles')
ax3.set_xlabel('Frequency')
ax3.set_ylabel('Shape')

for shape, count in phoenixcommonshapes:
    ax4.barh(shape, count)

ax4.set_title('Common Shapes in Phoenix')
ax4.set_xlabel('Frequency')
ax4.set_ylabel('Shape')


Text(0, 0.5, 'Shape')

Third Visualization: The third visualization looks more closely at the cities with the most UFO sightings to see what their most common UFO shapes were. In this visualization, we look at Las Vegas, Seattle, Los Angeles, and Phoenix. For all four plots, the most commonly sighted shape was “light”, which exceeds the other shapes by far.
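Since the four cities have different total numbers of sightings, comparing each shape's share of a city's reports can be easier than comparing raw counts. The following is a minimal sketch with a hypothetical helper `shape_shares` and made-up shapes for a single city.

```python
from collections import Counter


def shape_shares(shapes):
    """Return each shape's fraction of all reports, most common first."""
    counts = Counter(shapes)
    total = sum(counts.values())
    return {shape: n / total for shape, n in counts.most_common()}


# Hypothetical shapes for one city; not taken from the real dataset.
city_shapes = ["light", "light", "triangle", "light",
               "circle", "light", "fireball", "triangle"]

shares = shape_shares(city_shapes)
for shape, share in shares.items():
    print(f"{shape}: {share:.0%}")
```

Applying the helper to each city's `Data.Shape` values would show whether “light” dominates to the same degree everywhere.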

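# cities with the fewest sightings: most_common() sorts from most to least
# frequent, so the last 8000 entries are the least common
bottom15city = nltk.FreqDist(city).most_common()[-8000:]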
bottom15city
[('dixon', 11),
 ('felton', 11),
 ('glendora', 11),
 ('grapevine', 11),
 ('healdsburg', 11),
 ('hillsborough', 11),
 ('huron', 11),
 ('laguna beach', 11),
 ('manteca', 11),
 ('mojave', 11),
 ('orangevale', 11),
 ('paso robles', 11),
 ('placentia', 11),
 ('rancho santa margarita', 11),
 ('red bluff', 11),
 ('san ramon', 11),
 ('three rivers', 11),
 ('weaverville', 11),
 ('delta', 11),
 ('victor', 11),
 ('branford', 11),
 ('darien', 11),
 ('derby', 11),
 ('hamden', 11),
 ('bear', 11),
 ('bonita springs', 11),
 ('brooksville', 11),
 ('crestview', 11),
 ('inverness', 11),
 ('okeechobee', 11),
 ('pinellas park', 11),
 ('royal palm beach', 11),
 ('winter springs', 11),
 ('mcdonough', 11),
 ('ringgold', 11),
 ('temple', 11),
 ('thomasville', 11),
 ('hilo', 11),
 ('lahaina', 11),
 ('dubuque', 11),
 ('ottumwa', 11),
 ('middleton', 11),
 ('alton', 11),
 ('champaign', 11),
 ('crestwood', 11),
 ('deerfield', 11),
 ('gurnee', 11),
 ('lemont', 11),
 ('normal', 11),
 ('ottawa', 11),
 ('riverton', 11),
 ('schaumburg', 11),
 ('south elgin', 11),
 ('tuscola', 11),
 ('urbana', 11),
 ('fishers', 11),
 ('michigan city', 11),
 ('vincennes', 11),
 ('cameron', 11),
 ('denham springs', 11),
 ('attleboro', 11),
 ('fitchburg', 11),
 ('methuen', 11),
 ('northampton', 11),
 ('norwood', 11),
 ('revere', 11),
 ('rockport', 11),
 ('shrewsbury', 11),
 ('williamstown', 11),
 ('ellicott city', 11),
 ('gorham', 11),
 ('scarborough', 11),
 ('new brighton', 11),
 ('jefferson city', 11),
 ('kearney', 11),
 ('willard', 11),
 ('williamsville', 11),
 ('carolina beach', 11),
 ('chapel hill', 11),
 ('hampstead', 11),
 ('kernersville', 11),
 ('kill devil hills', 11),
 ('kure beach', 11),
 ('smithfield', 11),
 ('wrightsville beach', 11),
 ('clark', 11),
 ('hopewell', 11),
 ('sicklerville', 11),
 ('commack', 11),
 ('niagara falls', 11),
 ('beavercreek', 11),
 ('delaware', 11),
 ('scappoose', 11),
 ('indiana', 11),
 ('phoenixville', 11),
 ('stroudsburg', 11),
 ('spartanburg', 11),
 ('goodlettsville', 11),
 ('greeneville', 11),
 ('alvin', 11),
 ('laredo', 11),
 ('new braunfels', 11),
 ('lehi', 11),
 ('moab', 11),
 ('tooele', 11),
 ('east wenatchee', 11),
 ('poulsbo', 11),
 ('sammamish', 11),
 ('wisconsin dells', 11),
 ('gillette', 11),
 ('eagle river', 10),
 ('addison', 10),
 ('ozark', 10),
 ('junction city', 10),
 ('sherwood', 10),
 ('douglas', 10),
 ('duncan', 10),
 ('fountain hills', 10),
 ('payson', 10),
 ('boonville', 10),
 ('canby', 10),
 ('cathedral city', 10),
 ('lemoore', 10),
 ('lompoc', 10),
 ('los alamos', 10),
 ('madera', 10),
 ('newbury park', 10),
 ('northridge', 10),
 ('oak park', 10),
 ('paradise', 10),
 ('ramona', 10),
 ('salida', 10),
 ('san jacinto', 10),
 ('susanville', 10),
 ('tehachapi', 10),
 ('westwood', 10),
 ('breckenridge', 10),
 ('norwich', 10),
 ('harrington', 10),
 ('cocoa beach', 10),
 ('holiday', 10),
 ('jensen beach', 10),
 ('lantana', 10),
 ('melbourne beach', 10),
 ('tarpon springs', 10),
 ('acworth', 10),
 ('douglasville', 10),
 ('maysville', 10),
 ('kaneohe', 10),
 ('muscatine', 10),
 ('west des moines', 10),
 ('bourbonnais', 10),
 ('crete', 10),
 ('loves park', 10),
 ('martinsville', 10),
 ('spring grove', 10),
 ('texas city', 10),
 ('waukegan', 10),
 ('willow springs', 10),
 ('albion', 10),
 ('bennington', 10),
 ('brookston', 10),
 ('connersville', 10),
 ('corydon', 10),
 ('logansport', 10),
 ('warsaw', 10),
 ('beloit', 10),
 ('louisburg', 10),
 ('falmouth', 10),
 ('hopkinsville', 10),
 ('perryville', 10),
 ('sparta', 10),
 ('verona', 10),
 ('holden', 10),
 ('metairie', 10),
 ('beverly', 10),
 ('haverhill', 10),
 ('holyoke', 10),
 ('randolph', 10),
 ('southampton', 10),
 ('wayland', 10),
 ('holly', 10),
 ('lapeer', 10),
 ('madison heights', 10),
 ('novi', 10),
 ('chaska', 10),
 ('bolivar', 10),
 ('lumberton', 10),
 ('meadville', 10),
 ('plains', 10),
 ('new bern', 10),
 ('elkhorn', 10),
 ('hershey', 10),
 ('papillion', 10),
 ('cape may', 10),
 ('paterson', 10),
 ('piscataway', 10),
 ('pleasantville', 10),
 ('ashville', 10),
 ('bethpage', 10),
 ('north tonawanda', 10),
 ('plainview', 10),
 ('ronkonkoma', 10),
 ('vestal', 10),
 ('euclid', 10),
 ('lorain', 10),
 ('claremore', 10),
 ('yukon', 10),
 ('tualatin', 10),
 ('enola', 10),
 ('gettysburg', 10),
 ('aiken', 10),
 ('fort mill', 10),
 ('yankton', 10),
 ('copperas cove', 10),
 ('nacogdoches', 10),
 ('pearland', 10),
 ('the woodlands', 10),
 ('waxahachie', 10),
 ('layton', 10),
 ('front royal', 10),
 ('camas', 10),
 ('lynden', 10),
 ('port townsend', 10),
 ('snoqualmie', 10),
 ('university place', 10),
 ('lake geneva', 10),
 ('manitowoc', 10),
 ('neenah', 10),
 ('new berlin', 10),
 ('river falls', 10),
 ('cordova', 9),
 ('bradford', 9),
 ('chelsea', 9),
 ('cullman', 9),
 ('oneonta', 9),
 ('alma', 9),
 ('bentonville', 9),
 ('damascus', 9),
 ('osceola', 9),
 ('pocahontas', 9),
 ('benson', 9),
 ('globe', 9),
 ('holbrook', 9),
 ('sun city', 9),
 ('vail', 9),
 ('acton', 9),
 ('calabasas', 9),
 ('chino hills', 9),
 ('encino', 9),
 ('hermosa beach', 9),
 ('imperial beach', 9),
 ('la mirada', 9),
 ('la puente', 9),
 ('manhattan beach', 9),
 ('mariposa', 9),
 ('menifee', 9),
 ('mill valley', 9),
 ('perris', 9),
 ('san juan capistrano', 9),
 ('south gate', 9),
 ('south lake tahoe', 9),
 ('stanton', 9),
 ('temple city', 9),
 ('truckee', 9),
 ('watsonville', 9),
 ('west los angeles', 9),
 ('wildomar', 9),
 ('willits', 9),
 ('durango', 9),
 ('oak creek', 9),
 ('brooklyn', 9),
 ('cromwell', 9),
 ('new britain', 9),
 ('new milford', 9),
 ('southbury', 9),
 ('southport', 9),
 ('west hartford', 9),
 ('westport', 9),
 ('wethersfield', 9),
 ('altamonte springs', 9),
 ('baldwin', 9),
 ('dade city', 9),
 ('dunedin', 9),
 ('fort pierce', 9),
 ('hernando', 9),
 ('hobe sound', 9),
 ('marathon', 9),
 ('new smyrna beach', 9),
 ('palm city', 9),
 ('saint cloud', 9),
 ('sebring', 9),
 ('valrico', 9),
 ('wesley chapel', 9),
 ('wildwood', 9),
 ('byron', 9),
 ('dalton', 9),
 ('dawsonville', 9),
 ('flowery branch', 9),
 ('lincolnton', 9),
 ('mitchell', 9),
 ('suwanee', 9),
 ('carlisle', 9),
 ('clear lake', 9),
 ('urbandale', 9),
 ('moscow', 9),
 ('girard', 9),
 ('lake in the hills', 9),
 ('leland', 9),
 ('lewistown', 9),
 ('murphysboro', 9),
 ('park ridge', 9),
 ('pontiac', 9),
 ('streamwood', 9),
 ('wadsworth', 9),
 ('wilsonville', 9),
 ('winfield', 9),
 ('columbia city', 9),
 ('gary', 9),
 ('yorktown', 9),
 ('park city', 9),
 ('phillipsburg', 9),
 ('corinth', 9),
 ('pineville', 9),
 ('russell springs', 9),
 ('gonzales', 9),
 ('kenner', 9),
 ('chicopee', 9),
 ('fall river', 9),
 ('hopkinton', 9),
 ('marlboro', 9),
 ('north attleboro', 9),
 ('provincetown', 9),
 ('winthrop', 9),
 ('edgewood', 9),
 ('glen burnie', 9),
 ('nottingham', 9),
 ('parkville', 9),
 ('towson', 9),
 ('palmyra', 9),
 ('poland', 9),
 ('adrian', 9),
 ('dearborn heights', 9),
 ('flushing', 9),
 ('royal oak', 9),
 ('southfield', 9),
 ('brainerd', 9),
 ('coon rapids', 9),
 ('eden prairie', 9),
 ('ely', 9),
 ('minnetonka', 9),
 ('shoreview', 9),
 ('neosho', 9),
 ('nevada', 9),
 ('biloxi', 9),
 ('durant', 9),
 ('ripley', 9),
 ('apex', 9),
 ('cornelius', 9),
 ('garner', 9),
 ('huntersville', 9),
 ('waxhaw', 9),
 ('grand forks', 9),
 ('brady', 9),
 ('bayonne', 9),
 ('beachwood', 9),
 ('cherry hill', 9),
 ('new brunswick', 9),
 ('point pleasant', 9),
 ('alamogordo', 9),
 ('fallon', 9),
 ('dundee', 9),
 ('fairport', 9),
 ('liverpool', 9),
 ('riverhead', 9),
 ('boardman', 9),
 ('cuyahoga falls', 9),
 ('findlay', 9),
 ('kettering', 9),
 ('powell', 9),
 ('strongsville', 9),
 ('wooster', 9),
 ('bartlesville', 9),
 ('ponca city', 9),
 ('poteau', 9),
 ('weatherford', 9),
 ('baker city', 9),
 ('keizer', 9),
 ('chambersburg', 9),
 ('langhorne', 9),
 ('lansdale', 9),
 ('hixson', 9),
 ('sevierville', 9),
 ('bertram', 9),
 ('cedar park', 9),
 ('humble', 9),
 ('keller', 9),
 ('league city', 9),
 ('stephenville', 9),
 ('logan', 9),
 ('magna', 9),
 ('vernal', 9),
 ('burke', 9),
 ('hoquiam', 9),
 ('mercer island', 9),
 ('orting', 9),
 ('selah', 9),
 ('sheboygan', 9),
 ('beckley', 9),
 ('homer', 8),
 ('kenai', 8),
 ('crossville', 8),
 ('eldridge', 8),
 ('enterprise', 8),
 ('foley', 8),
 ('gadsden', 8),
 ('new hope', 8),
 ('selma', 8),
 ('arden', 8),
 ('glenwood', 8),
 ('pottsville', 8),
 ('san carlos', 8),
 ('tonopah', 8),
 ('aliso viejo', 8),
 ('antelope', 8),
 ('benicia', 8),
 ('bishop', 8),
 ('blythe', 8),
 ('canyon country', 8),
 ('ceres', 8),
 ('commerce', 8),
 ('compton', 8),
 ('corning', 8),
 ('desert hot springs', 8),
 ('diamond bar', 8),
 ('el cerrito', 8),
 ('la habra', 8),
 ('la quinta', 8),
 ('live oak', 8),
 ('los gatos', 8),
 ('orinda', 8),
 ('penn valley', 8),
 ('san dimas', 8),
 ('south pasadena', 8),
 ('south san francisco', 8),
 ('sun valley', 8),
 ('trinidad', 8),
 ('valley springs', 8),
 ('eagle', 8),
 ('evergreen', 8),
 ('morrison', 8),
 ('rifle', 8),
 ('dayville', 8),
 ('guilford', 8),
 ('marlborough', 8),
 ('southington', 8),
 ('vernon', 8),
 ('atlantic beach', 8),
 ('aventura', 8),
 ('key largo', 8),
 ('loxahatchee', 8),
 ('margate', 8),
 ('oakland park', 8),
 ('orange park', 8),
 ('polk city', 8),
 ('sebastian', 8),
 ('bloomingdale', 8),
 ('cairo', 8),
 ('gray', 8),
 ('grayson', 8),
 ('lithia springs', 8),
 ('morrow', 8),
 ('newnan', 8),
 ('reidsville', 8),
 ('stone mountain', 8),
 ('sylvania', 8),
 ('warner robins', 8),
 ('whitesburg', 8),
 ('keaau', 8),
 ('ainsworth', 8),
 ('dorchester', 8),
 ('fort dodge', 8),
 ('granville', 8),
 ('woodward', 8),
 ('montpelier', 8),
 ('pocatello', 8),
 ('des plaines', 8),
 ('granite city', 8),
 ('hinsdale', 8),
 ('hoffman estates', 8),
 ('ingleside', 8),
 ('new lenox', 8),
 ('olney', 8),
 ('oregon', 8),
 ('vandalia', 8),
 ('westmont', 8),
 ('angola', 8),
 ('charlestown', 8),
 ('lagrange', 8),
 ('lawrenceburg', 8),
 ('pendleton', 8),
 ('tipton', 8),
 ('marquette', 8),
 ('mission', 8),
 ('bardstown', 8),
 ('london', 8),
 ('louisa', 8),
 ('nicholasville', 8),
 ('scottsville', 8),
 ('agawam', 8),
 ('berkley', 8),
 ('framingham', 8),
 ('marstons mills', 8),
 ('stoughton', 8),
 ('friendship', 8),
 ('north east', 8),
 ('owings mills', 8),
 ('belgrade', 8),
 ('farmingdale', 8),
 ('otis', 8),
 ('raymond', 8),
 ('wells', 8),
 ('burton', 8),
 ('carson city', 8),
 ('east lansing', 8),
 ('fostoria', 8),
 ('grand blanc', 8),
 ('linden', 8),
 ('edina', 8),
 ('fairmont', 8),
 ('mankato', 8),
 ('maple grove', 8),
 ('mcgregor', 8),
 ('richfield', 8),
 ('branson', 8),
 ('ironton', 8),
 ('osage beach', 8),
 ('raymore', 8),
 ('brookhaven', 8),
 ('ocean springs', 8),
 ('wolf point', 8),
 ('goldsboro', 8),
 ('highlands', 8),
 ('mebane', 8),
 ('keene', 8),
 ('bergenfield', 8),
 ('east brunswick', 8),
 ('mahwah', 8),
 ('parsippany', 8),
 ('sayreville', 8),
 ('sweetwater', 8),
 ('hobbs', 8),
 ('socorro', 8),
 ('babylon', 8),
 ('cheektowaga', 8),
 ('endicott', 8),
 ('hicksville', 8),
 ('hyde park', 8),
 ('jamaica', 8),
 ('pine bush', 8),
 ('white plains', 8),
 ('christiansburg', 8),
 ('maumee', 8),
 ('south point', 8),
 ('west manchester', 8),
 ('enid', 8),
 ('clackamas', 8),
 ('hermiston', 8),
 ('hood river', 8),
 ('blacksburg', 8),
 ('easley', 8),
 ('little river', 8),
 ('dandridge', 8),
 ('dyersburg', 8),
 ('cleburne', 8),
 ('del rio', 8),
 ('hurst', 8),
 ('the colony', 8),
 ('bountiful', 8),
 ('green river', 8),
 ('roy', 8),
 ('luray', 8),
 ('spotsylvania', 8),
 ('suffolk', 8),
 ('chehalis', 8),
 ('duvall', 8),
 ('friday harbor', 8),
 ('gold bar', 8),
 ('pullman', 8),
 ('baraboo', 8),
 ('cedarburg', 8),
 ('plover', 8),
 ('sturgeon bay', 8),
 ('wautoma', 8),
 ('ketchikan', 7),
 ('seward', 7),
 ('anniston', 7),
 ('gardendale', 7),
 ('orange beach', 7),
 ('piedmont', 7),
 ('cabot', 7),
 ('garfield', 7),
 ('arizona city', 7),
 ('camp verde', 7),
 ('paradise valley', 7),
 ('anaheim hills', 7),
 ('banning', 7),
 ('big sur', 7),
 ('borrego springs', 7),
 ('burlingame', 7),
 ('cadiz', 7),
 ('cupertino', 7),
 ('daly city', 7),
 ('del mar', 7),
 ('desert center', 7),
 ('el monte', 7),
 ('el segundo', 7),
 ('foster city', 7),
 ('gardena', 7),
 ('grenada', 7),
 ('grover beach', 7),
 ('hanford', 7),
 ('joshua tree', 7),
 ('la crescenta', 7),
 ('lake arrowhead', 7),
 ('nipomo', 7),
 ('orange county', 7),
 ('panorama city', 7),
 ('paramount', 7),
 ('rancho cordova', 7),
 ('rosemead', 7),
 ('san anselmo', 7),
 ('san bruno', 7),
 ('san fernando', 7),
 ('sylmar', 7),
 ('west hollywood', 7),
 ('west sacramento', 7),
 ('yucca valley', 7),
 ('bayfield', 7),
 ('commerce city', 7),
 ('dillon', 7),
 ('nederland', 7),
 ('platteville', 7),
 ('sedalia', 7),
 ('strasburg', 7),
 ('east haven', 7),
 ('hebron', 7),
 ('new london', 7),
 ('oakville', 7),
 ('prospect', 7),
 ('westville', 7),
 ('anthony', 7),
 ('casselberry', 7),
 ('crystal beach', 7),
 ('kendall', 7),
 ('north miami', 7),
 ('north miami beach', 7),
 ('oldsmar', 7),
 ('parkland', 7),
 ('parrish', 7),
 ('port orange', 7),
 ('rockledge', 7),
 ('tamarac', 7),
 ('college park', 7),
 ('dahlonega', 7),
 ('evans', 7),
 ('midway', 7),
 ('moultrie', 7),
 ('norcross', 7),
 ('oakwood', 7),
 ('st. simons island', 7),
 ('sycamore', 7),
 ('villa rica', 7),
 ('kahului', 7),
 ('kapaa', 7),
 ('mason city', 7),
 ('solon', 7),
 ('blackfoot', 7),
 ('emmett', 7),
 ('midvale', 7),
 ('rexburg', 7),
 ('algonquin', 7),
 ('chicago heights', 7),
 ('dekalb', 7),
 ('edinburg', 7),
 ('hanover park', 7),
 ('harvard', 7),
 ('lake villa', 7),
 ('manteno', 7),
 ('mundelein', 7),
 ('seneca', 7),
 ('woodridge', 7),
 ('brazil', 7),
 ('wabash', 7),
 ('walton', 7),
 ('west lafayette', 7),
 ('zionsville', 7),
 ('emporia', 7),
 ('norton', 7),
 ('mayfield', 7),
 ('millersburg', 7),
 ('pikeville', 7),
 ('blanchard', 7),
 ('houma', 7),
 ('brookline', 7),
 ('malden', 7),
 ('shirley', 7),
 ('south yarmouth', 7),
 ('southbridge', 7),
 ('waltham', 7),
 ('westford', 7),
 ('weymouth', 7),
 ('dundalk', 7),
 ('fort washington', 7),
 ('severn', 7),
 ('sykesville', 7),
 ('ellsworth', 7),
 ('presque isle', 7),
 ('attica', 7),
 ('davison', 7),
 ('flat rock', 7),
 ('gaylord', 7),
 ('kentwood', 7),
 ('south lyon', 7),
 ('sturgis', 7),
 ('taylor', 7),
 ('crosby', 7),
 ('inver grove heights', 7),
 ('little falls', 7),
 ('cuba', 7),
 ('starkville', 7),
 ('conover', 7),
 ('havelock', 7),
 ('statesville', 7),
 ('surf city', 7),
 ('hooksett', 7),
 ('windham', 7),
 ('asbury park', 7),
 ('keansburg', 7),
 ('marlton', 7),
 ('perth amboy', 7),
 ('ridgewood', 7),
 ('sewell', 7),
 ('gallup', 7),
 ('tularosa', 7),
 ('winnemucca', 7),
 ('bayside', 7),
 ('coram', 7),
 ('cortland', 7),
 ('forest hills', 7),
 ('herkimer', 7),
 ('hudson falls', 7),
 ('jericho', 7),
 ('kings park', 7),
 ('lake george', 7),
 ('massapequa', 7),
 ('montauk', 7),
 ('peekskill', 7),
 ('perrysburg', 7),
 ('port washington', 7),
 ('sag harbor', 7),
 ('saratoga springs', 7),
 ('selden', 7),
 ('smithtown', 7),
 ('sunnyside', 7),
 ('amelia', 7),
 ('barberton', 7),
 ('brook park', 7),
 ('celina', 7),
 ('kilgore', 7),
 ('miamisburg', 7),
 ('north royalton', 7),
 ('peebles', 7),
 ('piqua', 7),
 ('port clinton', 7),
 ('reynoldsburg', 7),
 ('sunbury', 7),
 ('wapakoneta', 7),
 ('el reno', 7),
 ('muskogee', 7),
 ('sapulpa', 7),
 ('christmas valley', 7),
 ('rogue river', 7),
 ('bensalem', 7),
 ('bethel park', 7),
 ('conshohocken', 7),
 ('edinboro', 7),
 ('fairless hills', 7),
 ('norristown', 7),
 ('philipsburg', 7),
 ('yardley', 7),
 ('narragansett', 7),
 ('west warwick', 7),
 ('woonsocket', 7),
 ('ladson', 7),
 ('north charleston', 7),
 ('taylors', 7),
 ('travelers rest', 7),
 ('eagle butte', 7),
 ('elizabethton', 7),
 ('pigeon forge', 7),
 ('tullahoma', 7),
 ('alice', 7),
 ('brownwood', 7),
 ('euless', 7),
 ('palestine', 7),
 ('southlake', 7),
 ('draper', 7),
 ('kearns', 7),
 ('spanish fork', 7),
 ('culpeper', 7),
 ('reston', 7),
 ('staunton', 7),
 ('langley', 7),
 ('sultan', 7),
 ('west seattle', 7),
 ('menomonie', 7),
 ('oconomowoc', 7),
 ('tomah', 7),
 ('moundsville', 7),
 ('petersburg', 6),
 ('adamsville', 6),
 ('alabaster', 6),
 ('bremen', 6),
 ('daphne', 6),
 ('fort morgan', 6),
 ('springville', 6),
 ('bryant', 6),
 ('elkins', 6),
 ('glen rose', 6),
 ('malvern', 6),
 ('searcy', 6),
 ('van buren', 6),
 ('chino valley', 6),
 ('jerome', 6),
 ('kearny', 6),
 ('agoura hills', 6),
 ('alamo', 6),
 ('atascadero', 6),
 ('calistoga', 6),
 ('canyon', 6),
 ('cedarville', 6),
 ('cerritos', 6),
 ('clearlake', 6),
 ('collegeville', 6),
 ('coloma', 6),
 ('duarte', 6),
 ('granada hills', 6),
 ('half moon bay', 6),
 ('inglewood', 6),
 ('lone pine', 6),
 ('ludlow', 6),
 ('mammoth lakes', 6),
 ('marina del rey', 6),
 ('monterey park', 6),
 ('moorpark', 6),
 ('mount shasta', 6),
 ('nevada city', 6),
 ('north highlands', 6),
 ('pacifica', 6),
 ('pollock pines', 6),
 ('rancho mirage', 6),
 ('rosamond', 6),
 ('santa paula', 6),
 ('shasta lake', 6),
 ('vinton', 6),
 ('westlake village', 6),
 ('wilton', 6),
 ('yreka', 6),
 ('bailey', 6),
 ('centennial', 6),
 ('cortez', 6),
 ('estes park', 6),
 ('falcon', 6),
 ('northglenn', 6),
 ('ovid', 6),
 ('steamboat springs', 6),
 ('wheat ridge', 6),
 ('cheshire', 6),
 ('glastonbury', 6),
 ('naugatuck', 6),
 ('north haven', 6),
 ('old lyme', 6),
 ('old saybrook', 6),
 ('somers', 6),
 ('stonington', 6),
 ('winsted', 6),
 ('elsmere', 6),
 ('rehoboth beach', 6),
 ('townsend', 6),
 ('dunnellon', 6),
 ('indialantic', 6),
 ('indian rocks beach', 6),
 ('lake placid', 6),
 ('lutz', 6),
 ('middleburg', 6),
 ('palm beach', 6),
 ('ponte vedra beach', 6),
 ('shalimar', 6),
 ('silver springs', 6),
 ('tavares', 6),
 ('valparaiso', 6),
 ('warrington', 6),
 ('bainbridge', 6),
 ('calhoun', 6),
 ('grovetown', 6),
 ('martin', 6),
 ('pembroke', 6),
 ('powder springs', 6),
 ('haleiwa', 6),
 ('lihue', 6),
 ('wahiawa', 6),
 ('clarion', 6),
 ('de soto', 6),
 ('keokuk', 6),
 ('preston', 6),
 ('burley', 6),
 ('beecher', 6),
 ('berwyn', 6),
 ('blue island', 6),
 ('carpentersville', 6),
 ('caseyville', 6),
 ('durand', 6),
 ('jerseyville', 6),
 ('lisle', 6),
 ('machesney park', 6),
 ('matteson', 6),
 ('mclean', 6),
 ('potomac', 6),
 ('ridgway', 6),
 ('ringwood', 6),
 ('rushville', 6),
 ('sandwich', 6),
 ('shorewood', 6),
 ('sugar grove', 6),
 ('sullivan', 6),
 ('summit', 6),
 ('taylorville', 6),
 ('vernon hills', 6),
 ('wauconda', 6),
 ('worth', 6),
 ('chesterton', 6),
 ('coatesville', 6),
 ('kendallville', 6),
 ('merrillville', 6),
 ('pittsboro', 6),
 ('rising sun', 6),
 ('chanute', 6),
 ('colby', 6),
 ('hays', 6),
 ('latham', 6),
 ('pratt', 6),
 ('wellsville', 6),
 ('berea', 6),
 ('hazard', 6),
 ('abbeville', 6),
 ('gretna', 6),
 ('minden', 6),
 ('sulphur', 6),
 ('west monroe', 6),
 ('youngsville', 6),
 ('billerica', 6),
 ('braintree', 6),
 ('chelmsford', 6),
 ('foxboro', 6),
 ('heath', 6),
 ('hyannis', 6),
 ('leominster', 6),
 ('raynham', 6),
 ('stow', 6),
 ('whitman', 6),
 ('yarmouth', 6),
 ('mt. airy', 6),
 ('bar harbor', 6),
 ('old orchard beach', 6),
 ('saco', 6),
 ('skowhegan', 6),
 ('turner', 6),
 ('algonac', 6),
 ('forest grove', 6),
 ('hudsonville', 6),
 ('iron mountain', 6),
 ('lake orion', 6),
 ('mayville', 6),
 ('petoskey', 6),
 ('saline', 6),
 ('fergus falls', 6),
 ('jordan', 6),
 ('owatonna', 6),
 ('victoria', 6),
 ('white bear lake', 6),
 ('ballwin', 6),
 ('bridgeton', 6),
 ('center', 6),
 ('festus', 6),
 ('hazelwood', 6),
 ('millersville', 6),
 ('passaic', 6),
 ('sikeston', 6),
 ('natchez', 6),
 ...]

# For four cities with few sightings, count their 15 most common UFO shapes.
lagunacity = df.loc[df['Location.City'] == "laguna beach"]
lagunashape = lagunacity['Data.Shape']
lagunacommonshapes = nltk.FreqDist(lagunashape).most_common(15)

glendora = df.loc[df['Location.City'] == "glendora"]
glendorashape = glendora['Data.Shape']
glendoracommonshapes = nltk.FreqDist(glendorashape).most_common(15)

michigan = df.loc[df['Location.City'] == "michigan city"]
michiganshape = michigan['Data.Shape']
michigancommonshapes = nltk.FreqDist(michiganshape).most_common(15)

niagara = df.loc[df['Location.City'] == "niagara falls"]
niagarashape = niagara['Data.Shape']
niagaracommonshapes = nltk.FreqDist(niagarashape).most_common(15)

# One horizontal bar chart per city, stacked vertically in a single figure.
fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows=4, ncols=1, gridspec_kw={'hspace': 0.5}, figsize=(9, 12))


for shape, count in lagunacommonshapes:
    ax1.barh(shape, count)

ax1.set_title('Common Shapes in Laguna')
ax1.set_xlabel('Frequency')
ax1.set_ylabel('Shape')

for shape, count in glendoracommonshapes:
    ax2.barh(shape, count)

ax2.set_title('Common Shapes in Glendora')
ax2.set_xlabel('Frequency')
ax2.set_ylabel('Shape')

for shape, count in michigancommonshapes:
    ax3.barh(shape, count)

ax3.set_title('Common Shapes in Michigan City')
ax3.set_xlabel('Frequency')
ax3.set_ylabel('Shape')

for shape, count in niagaracommonshapes:
    ax4.barh(shape, count)

ax4.set_title('Common Shapes in Niagara Falls')
ax4.set_xlabel('Frequency')
ax4.set_ylabel('Shape')

Fourth Visualization: The fourth visualization looks at specific cities with few UFO sightings to see which UFO shapes were reported most often there. The figure covers Laguna Beach, Glendora, Michigan City, and Niagara Falls. In Laguna Beach, the most common shape is “circle”, reported about 3 times as often as the city’s next most common shapes. In Glendora, “light”, “disk”, and “circle” are tied for most common, each reported only about 2 times as often as the city’s remaining shapes. In Michigan City, “sphere” is the most common shape, reported 2 times as often as the other shapes. Finally, in Niagara Falls, “formation” and “circle” are tied for most common, each reported twice as often as the other shapes.
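The "times as often" comparisons in this paragraph can be computed from the shape counts rather than read off the bars. The following is a minimal sketch, assuming the dataset's `Location.City` and `Data.Shape` column names; the helper `top_shape_ratio` and the small sample frame are made up for illustration and are not part of the original analysis.

```python
import pandas as pd


def top_shape_ratio(frame, city):
    """Return (top shapes, top count, runner-up count, ratio) for one city.

    Hypothetical helper: every shape tied for first place is reported, and
    the ratio compares the top count with the next distinct count below it.
    """
    counts = frame.loc[frame['Location.City'] == city, 'Data.Shape'].value_counts()
    if counts.empty:
        return [], 0, 0, None
    top_count = int(counts.iloc[0])  # value_counts sorts in descending order
    top_shapes = sorted(counts[counts == top_count].index)
    lower = counts[counts < top_count]
    if lower.empty:
        return top_shapes, top_count, 0, None
    runner_up = int(lower.iloc[0])
    return top_shapes, top_count, runner_up, top_count / runner_up


# Made-up sample rows, only to show the output format (not real sightings).
sample = pd.DataFrame({
    'Location.City': ['laguna beach'] * 5 + ['glendora'] * 5,
    'Data.Shape': ['circle', 'circle', 'circle', 'light', 'disk',
                   'light', 'light', 'disk', 'disk', 'oval'],
})

print(top_shape_ratio(sample, 'laguna beach'))
print(top_shape_ratio(sample, 'glendora'))
```

On the real data, the same helper could be called as `top_shape_ratio(df, 'laguna beach')` and repeated for the other three cities.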

# change in frequency of each UFO shape over time
filtered_df = df.loc[(df['Dates.Sighted.Year'] >= 1997) & (df['Dates.Sighted.Year'] <= 2014)]
grouped_data = filtered_df.groupby(['Dates.Sighted.Year', 'Data.Shape']).size().unstack()
grouped_data.plot.bar(stacked=True, colormap='tab20')
plt.legend(title='Shape',bbox_to_anchor=(1.5,1))
plt.title('Shape Frequencies over the Years')

Discussion

The main findings of my analysis are that UFO sightings have increased over the years, that most sightings are of the ‘light’ shape, and that sightings are most common in large urban cities. It also seems that the cities with the most UFO sightings have ‘light’ as their most common shape, whereas the cities with the fewest UFO sightings have various other shapes besides ‘light’ as their most common shapes.
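The first two findings can be checked numerically with a small per-year summary. This is a sketch under assumptions: it uses the dataset's `Dates.Sighted.Year` and `Data.Shape` column names, while the function `yearly_summary` and the sample rows are hypothetical and only illustrate the calculation.

```python
import pandas as pd


def yearly_summary(frame):
    """Per year: total sightings, most common shape, and that shape's share."""
    counts = (frame.groupby(['Dates.Sighted.Year', 'Data.Shape'])
                   .size()
                   .unstack(fill_value=0))
    totals = counts.sum(axis=1)
    return pd.DataFrame({
        'total': totals,
        'top_shape': counts.idxmax(axis=1),
        'top_share': counts.max(axis=1) / totals,
    })


# Made-up rows for illustration only; not taken from the dataset.
sample = pd.DataFrame({
    'Dates.Sighted.Year': [1997, 1997, 1997, 1998, 1998, 1998, 1998, 1998],
    'Data.Shape': ['light', 'light', 'disk',
                   'light', 'light', 'light', 'circle', 'disk'],
})

summary = yearly_summary(sample)
print(summary)
```

Applied to the real data as `yearly_summary(df)`, a rising `total` column would support the claim that sightings increased, and a `top_shape` column dominated by ‘light’ would support the second claim.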

I noticed that the cities with the most UFO sightings, such as Las Vegas, Seattle, Los Angeles, and Phoenix, are all large urban cities that have a lot of infrastructure, such as skyscrapers, and that stay lively at night, unlike cities that consist mostly of suburbs. This could be an explanation for why

Sources